AITopics | short video clips

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

REGen: Multimodal Retrieval-Embedded Generation for Long-to-Short Video Editing

Neural Information Processing Systems | Jun-11-2026, 23:45:51 GMT

Short videos are an effective tool for promoting content and improving knowledge accessibility. While existing extractive video summarization methods struggle to produce a coherent narrative, existing abstractive methods cannot "quote" from the input videos, i.e., insert short video clips into their outputs. In this work, we explore novel video editing models for generating shorts that feature a coherent narrative with embedded video insertions extracted from a long input video. We propose a novel retrieval-embedded generation framework that allows a large language model to quote multimodal resources while maintaining a coherent narrative. Our proposed REGen system first generates the output story script with quote placeholders using a finetuned large language model, and then uses a novel retrieval model to fill each quote placeholder with the video clip that best supports the narrative, selected from a pool of candidate quotable video clips. We examine the proposed method on the task of documentary teaser generation, where short interview insertions are commonly used to support the narrative of a documentary. Our objective evaluations show that the proposed method can effectively insert short video clips while maintaining a coherent narrative. In a subjective survey, we show that our proposed method outperforms existing abstractive and extractive approaches in terms of coherence, alignment, and realism in teaser generation.
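The two-stage idea in this abstract (a script with quote placeholders, then retrieval to fill them) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `[QUOTE]` marker, the `fill_quotes` helper, the clip dictionaries, and the sample sentences are hypothetical, and a simple bag-of-words cosine similarity stands in for the paper's learned retrieval model, which the abstract does not specify.

```python
import math
import re
from collections import Counter

PLACEHOLDER = "[QUOTE]"  # hypothetical marker; the abstract does not give the real format


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def fill_quotes(script_lines, candidates):
    """Replace each placeholder with the unused clip closest to the adjacent narration."""
    output, used = [], set()
    for i, line in enumerate(script_lines):
        if line != PLACEHOLDER:
            output.append(("narration", line))
            continue
        # Context: the narration lines immediately before and after the placeholder.
        neighbors = script_lines[max(0, i - 1): i + 2]
        context = bag_of_words(" ".join(l for l in neighbors if l != PLACEHOLDER))
        best = max(
            (c for c in candidates if c["id"] not in used),
            key=lambda c: cosine(context, bag_of_words(c["transcript"])),
            default=None,
        )
        if best is not None:
            used.add(best["id"])
            output.append(("clip", best["id"]))
    return output


# Stage 1 output (would come from the finetuned language model; hand-written here).
script = [
    "The reef is dying faster than anyone predicted.",
    PLACEHOLDER,
    "But a small team of divers refuses to give up.",
    PLACEHOLDER,
]
# Pool of candidate quotable clips, represented by their transcripts.
candidates = [
    {"id": "clip_a", "transcript": "We dive every morning and we will never give up on this place."},
    {"id": "clip_b", "transcript": "The reef lost half its coral; nobody predicted it would be dying this fast."},
    {"id": "clip_c", "transcript": "Funding for my lab comes mostly from private donors."},
]
teaser = fill_quotes(script, candidates)
```

In this toy run, the first placeholder is filled with `clip_b` and the second with `clip_a`, while the off-topic `clip_c` is never selected. A real system would compare multimodal features rather than word overlap.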

artificial intelligence, large language model, natural language, (10 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

Seeing Beyond Frames: Zero-Shot Pedestrian Intention Prediction with Raw Temporal Video and Multimodal Cues

Zambare, Pallavi; Thanikella, Venkata Nikhil; Liu, Ying

arXiv.org Artificial Intelligence | Jul-30-2025

Pedestrian intention prediction is essential for autonomous driving in complex urban environments. Conventional approaches depend on supervised learning over frame sequences and require extensive retraining to adapt to new scenarios. Here, we introduce BF-PIP (Beyond Frames Pedestrian Intention Prediction), a zero-shot approach built upon Gemini 2.5 Pro. It infers crossing intentions directly from short, continuous video clips enriched with structured JAAD metadata. In contrast to GPT-4V-based methods that operate on discrete frames, BF-PIP processes uninterrupted temporal clips. It also incorporates bounding-box annotations and ego-vehicle speed via specialized multimodal prompts. Without any additional training, BF-PIP achieves 73% prediction accuracy, outperforming a GPT-4V baseline by 18%. These findings illustrate that combining temporal video inputs with contextual cues enhances spatiotemporal perception and improves intent inference under ambiguous conditions. This approach paves the way for agile, retraining-free perception modules in intelligent transportation systems.
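As a rough illustration of how bounding-box annotations and ego-vehicle speed might be folded into the text half of a multimodal prompt, consider the sketch below. The `FrameAnnotation` fields, the `build_prompt` helper, the sample values, and the prompt wording are all hypothetical; the abstract does not disclose the actual prompt format, and no model API is called here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FrameAnnotation:
    """Hypothetical per-frame context for one tracked pedestrian."""
    frame: int
    bbox: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels, assumed layout
    ego_speed: str                   # assumed categorical label, e.g. "decelerating"


def build_prompt(annotations: List[FrameAnnotation]) -> str:
    """Serialize the annotations into text that would accompany the video clip."""
    lines = [
        "You are given a short driving video clip.",
        "Per-frame context for the highlighted pedestrian:",
    ]
    for a in annotations:
        x1, y1, x2, y2 = a.bbox
        lines.append(
            f"- frame {a.frame}: bbox=({x1}, {y1}, {x2}, {y2}), ego vehicle {a.ego_speed}"
        )
    lines.append(
        "Will the pedestrian cross in front of the vehicle? "
        "Answer 'crossing' or 'not crossing'."
    )
    return "\n".join(lines)


# Illustrative values only; not taken from the JAAD dataset.
prompt = build_prompt([
    FrameAnnotation(frame=10, bbox=(100, 200, 150, 320), ego_speed="moving slow"),
    FrameAnnotation(frame=20, bbox=(110, 198, 162, 322), ego_speed="decelerating"),
])
```

The resulting string would be sent together with the uninterrupted video clip, so that a zero-shot model receives both the temporal input and the structured cues in a single request.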

large language model, machine learning, natural language, (16 more...)

arXiv:2507.21161

Genre: Research Report > New Finding (0.68)

Industry:

Information Technology (0.67)
Transportation > Ground > Road (0.38)
Transportation > Infrastructure & Services (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Meta unveils artificial intelligence-generated video

#artificialintelligence | Sep-30-2022, 17:00:18 GMT

Meta announced that it was taking artificial intelligence-generated art to the next level by allowing users to create short video clips just by typing in a string of descriptive statements. Meta's AI division announced Thursday that it was unveiling Make-A-Video, an AI system that allows users to turn text prompts into short video clips of whatever was described. "Generative AI research is pushing creative expression forward by giving people tools to quickly and easily create new content," Meta said in a post describing the new technology. "With just a few words or lines of text, Make-A-Video can bring imagination to life and create one-of-a-kind videos full of vivid colors, characters, and landscapes."

meta unveil artificial intelligence-generated video, short video clips, software, (6 more...)

Country: North America > United States > Colorado (0.07)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.59)

Using synthetic data for deep learning video recognition

#artificialintelligence | Jan-12-2018, 19:17:04 GMT

In recent years, deep learning has completely revolutionized the fields of computer vision, speech recognition and natural language processing. Despite breakthroughs in all three fields, one common barrier to training neural networks for real-world problems remains the amount of labeled data a model requires. In some domains, like video understanding, gathering real-world data can be prohibitively expensive and time-consuming in the absence of innovative solutions. At TwentyBN, we solved this problem by building an in-house data factory for generating high-quality videos for neural networks to learn about the real world. We instruct crowd workers to record short video clips based on carefully predefined and highly specific descriptions.

artificial intelligence, machine learning, video data, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.75)

DeepMind AI Teaches Itself About the World by Watching Videos

#artificialintelligence | Aug-22-2017, 22:30:12 GMT

Researchers at Google's DeepMind unit have developed an artificial intelligence (AI) system that teaches itself to recognize a range of visual and audio concepts by watching short video clips. For example, the new system can understand the concept of lawn mowing, even when it has not learned the words to describe what it is hearing or seeing. "We want to build machines that continuously learn about their environment in an autonomous manner," says University of California, Berkeley researcher Pulkit Agrawal. He notes the DeepMind project brings the field one step closer to the goal of creating AI that can teach itself by watching and listening to the world around it.

large language model, machine learning, natural language, (4 more...)

Country:

North America > United States > California > Alameda County > Berkeley (0.29)
North America > United States > Maryland > Montgomery County > Bethesda (0.09)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Google Motion Stills turns your Live Photos into GIFs: Free iOS app now makes it easier to create looping animations

Daily Mail - Science & tech | Jun-8-2016, 11:31:54 GMT

Despite their popularity, GIFs can still be tricky to create. But a new app from Google, called 'Motion Stills', will allow you to easily create the moving images in just a few clicks. The app takes Live Photos, several frames automatically captured before and after you hit the camera app's shutter button, and turns them into GIFs or short video clips. 'We use our video stabilization technology to freeze the background into a still photo or create sweeping cinematic pans,' Ken Conley and Matthias Grundmann from the Google Research Machine Perception team said in a blog post. 'The resulting looping GIFs and movies come alive, and can easily be shared via messaging or on social media.'

artificial intelligence, live photo, machine learning, (18 more...)

Industry: Information Technology > Services (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.40)
